Spark Whitepaper

Created by	Patrick Woodhead
Created time	@September 4, 2024 8:25 PM
Last edited by	Julian Gruber
Last edited time	@September 5, 2024 12:53 PM

Abstract

Many Web3 storage networks have been built with proofs of durability; Filecoin, for example, built PoRep (Proof-of-Replication). However, to our knowledge, no one has yet developed a proof of retrieval protocol. There are even impossibility results suggesting that a proof of retrieval is not attainable. These results, however, focus on the setting in which a client and a provider wish to prove that a specific retrieval has taken place. The Spark protocol takes a different approach in order to avoid the self-dealing and Sybil attack vectors that have plagued this space. For each retrieval task, the protocol randomly samples the client and the file, and therefore the provider, and it rewards participants only for these randomly chosen tasks. As a result, self-dealt and Sybil tasks are not evaluated by the protocol and do not affect the outcome. Furthermore, because tasks are chosen at random, a provider cannot know which data Spark will choose to test next. Even though Spark will not retrieve all data, the provider must therefore keep all data available for retrieval.

Introduction

todo

Background

Definitions

Instantiator

The instantiator is the network or storage-providing service that wishes to deploy the Spark protocol to check the retrievability of its data. Examples include the Filecoin network and the Storacha network.

Checker

The checkers are the nodes or entities that make retrieval requests according to the rules of the protocol.

Provider

The provider is the entity that will be serving the retrieval requests back to the checker. There can be one or many providers in a network.

Orchestrator

The Spark orchestrator is the code in the protocol that forms committees, assigns tasks and collects measurements.

Model

todo

Protocol

In this section, we describe the Spark protocol in detail. We have split the protocol into two steps. The first is the setup: everything that needs to happen before the protocol starts running “rounds”. The second is the per round protocol: the steps that repeat in every round once the protocol is running.

Setup

The Spark Smart Contract

The smart contract that governs the Spark protocol must be deployed for each instantiation. The smart contract governs the measurements, evaluations and rewards as well as the advancement of rounds.

Funding

The instantiator must add funds to the smart contract in order to incentivise the checker nodes to run the checks. These funds can be distributed between the checkers and providers according to the wishes of the instantiator.

Task List Generation

The instantiator must provide a list of tasks that it would like the Spark protocol to test. As we will see in the Per Round section, the Spark protocol randomly samples tasks and assigns them to checkers in each round of the protocol.

Per Round

Committee Formation

At the start of each Spark round, the protocol uses a randomness beacon to deterministically-at-random group the online Spark checkers into several committees.
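The grouping step can be illustrated with a minimal Python sketch. It is not the Spark implementation: the function name `form_committees`, the `committee_size` parameter, the checker IDs and the use of SHA-256 over the beacon value are all assumptions made for illustration. The idea it shows is that anyone who knows the same beacon value and the same set of online checkers computes exactly the same committees.

```python
import hashlib


def form_committees(checker_ids, beacon_value, committee_size):
    """Group checkers into committees from a shared random value.

    Hypothetical helper: every party that knows the same beacon value and
    the same checker set computes exactly the same grouping.
    """
    def sort_key(checker_id):
        # Hash the beacon value together with the checker ID, so the
        # ordering cannot be predicted before the beacon value is known.
        data = beacon_value + b"|" + checker_id.encode("utf-8")
        return hashlib.sha256(data).digest()

    shuffled = sorted(set(checker_ids), key=sort_key)
    # Slice the shuffled list into consecutive groups of committee_size.
    # The last committee may be smaller than the others.
    return [
        shuffled[i:i + committee_size]
        for i in range(0, len(shuffled), committee_size)
    ]


# Illustrative inputs only; real beacon values and checker IDs will differ.
checkers = [f"checker-{n}" for n in range(10)]
committees = form_committees(checkers, b"round-42-beacon", committee_size=4)
```

Because the ordering depends only on the beacon value and the checker IDs, the order in which checkers are listed does not change the result.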

The members of each committee will make the same retrieval checks during the round. The committee will then come to an honest majority consensus about the result of the retrieval.

There are a few reasons why the protocol randomly builds these different committees in each round:

By building the committees randomly, the protocol protects against one party controlling an entire committee.

In each round, the protocol can assign different checks to each committee, as opposed to all checkers making the same requests. This prevents a single provider from receiving too many identical requests in a round and becoming overloaded.

Committees reduce the ways in which checkers can act fraudulently. By coming to an honest majority consensus on the result of the retrieval, checkers are held to account by the others in their committee: checkers can only earn rewards if they are part of the majority.
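The majority rule described above can be sketched as follows. This is a hedged illustration, not the protocol's evaluation logic: the function name `evaluate_committee`, the result strings such as "OK" and "TIMEOUT", and the choice of a strict majority (more than half of the reports) as the threshold are assumptions.

```python
from collections import Counter


def evaluate_committee(reports):
    """Return (majority_result, rewarded_checkers) for one task.

    reports maps a checker ID to the result it reported. If no result is
    reported by a strict majority, nothing is accepted and no checker is
    rewarded (an assumption of this sketch).
    """
    if not reports:
        return None, []
    counts = Counter(reports.values())
    result, votes = counts.most_common(1)[0]
    if votes * 2 <= len(reports):
        return None, []
    # Only checkers that agree with the majority result earn a reward.
    rewarded = sorted(c for c, r in reports.items() if r == result)
    return result, rewarded


reports = {"checker-1": "OK", "checker-2": "OK", "checker-3": "TIMEOUT"}
result, rewarded = evaluate_committee(reports)
# result is "OK"; rewarded is ["checker-1", "checker-2"]
```

A checker that reports a result different from its committee's majority, whether through fraud or through a local fault, earns nothing for that task.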

Assignment of Tasks

In each round, following the formation of committees, the protocol deterministically-at-random assigns each committee a set of x tasks to test. The checkers in each committee can choose the order in which they complete their tasks; they only need to complete them within the round in order to be included in the evaluation and rewards for those checks.
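A minimal sketch of such an assignment, under stated assumptions: the function name `assign_tasks`, the per-committee seed derivation and the placeholder task values are hypothetical, and Python's `random.Random` stands in for whatever sampling procedure the protocol actually specifies. Independent implementations would need a shared, fully specified sampling algorithm to agree on the result.

```python
import hashlib
import random


def assign_tasks(task_list, committee_count, beacon_value, tasks_per_committee):
    """Pick tasks_per_committee tasks for each committee from the task list."""
    assignments = []
    k = min(tasks_per_committee, len(task_list))
    for index in range(committee_count):
        # Derive a reproducible seed per committee from the beacon value.
        digest = hashlib.sha256(beacon_value + index.to_bytes(4, "big")).digest()
        rng = random.Random(int.from_bytes(digest, "big"))
        # Sample without replacement, so a committee never gets the
        # same task twice in one round.
        assignments.append(rng.sample(task_list, k))
    return assignments


# Placeholder tasks of the form (CID, provider); the values are illustrative.
task_list = [(f"cid-{n}", f"provider-{n % 3}") for n in range(20)]
assignments = assign_tasks(task_list, 3, b"round-42-beacon", 2)
```

Here `tasks_per_committee` plays the role of x in the text above.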

Retrieval Test

A task is of the form (Filename or CID, Provider). The checker makes a retrieval request to the provider for the file or CID.

Measurement Submission

The checker reports back to the protocol orchestrator whether or not the retrieval was successful, and if not, what error occurred.
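The task and the reported measurement can be modelled as in the sketch below. All names here (`Task`, `Measurement`, `run_check`, `fake_fetch`, the "EMPTY_RESPONSE" error code) are hypothetical, and the network request is replaced by an injected `fetch` function so that the example runs without a real provider. The sketch does not verify the returned bytes against the CID.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Task:
    cid: str        # filename or CID to retrieve
    provider: str   # provider expected to serve it


@dataclass(frozen=True)
class Measurement:
    checker_id: str
    task: Task
    success: bool
    error: Optional[str] = None  # set only when the retrieval failed


def run_check(checker_id: str, task: Task,
              fetch: Callable[[str, str], bytes]) -> Measurement:
    """Attempt one retrieval and turn the outcome into a measurement."""
    try:
        content = fetch(task.provider, task.cid)
    except Exception as exc:
        # Report which kind of error occurred, as the protocol requires.
        return Measurement(checker_id, task, False, type(exc).__name__)
    if not content:
        return Measurement(checker_id, task, False, "EMPTY_RESPONSE")
    return Measurement(checker_id, task, True)


def fake_fetch(provider: str, cid: str) -> bytes:
    # Stand-in for a real network request: only "provider-a" responds.
    if provider != "provider-a":
        raise TimeoutError("provider did not respond")
    return b"file bytes"


ok = run_check("checker-1", Task("cid-0", "provider-a"), fake_fetch)
failed = run_check("checker-1", Task("cid-0", "provider-b"), fake_fetch)
```

The resulting `Measurement` values are what a checker would submit to the orchestrator at the end of the check.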